Search results for "Algorithms and data structures"

showing 3 items of 3 documents

Epigenomic k-mer dictionaries: shedding light on how sequence composition influences in vivo nucleosome positioning

2014

Abstract Motivation: Information-theoretic and compositional analysis of biological sequences, in terms of k-mer dictionaries, has a well-established role in genomic and proteomic studies. Much less so in epigenomics, although the role of k-mers in chromatin organization and nucleosome positioning is particularly relevant. Fundamental questions concerning the informational content and compositional structure of nucleosome-favouring and nucleosome-disfavouring sequences with respect to their basic building blocks still remain open. Results: We present the first analysis on the role of k-mers in the composition of nucleosome-enriched and nucleosome-depleted genomic regions (NER and NDR for short) that is: (i) exhau…

Epigenomics; Statistics and Probability; Genetics; Supplementary data; Sequence; Genome; Settore INF/01 - Informatica; Sequence Analysis DNA; Computational biology; Algorithms and Data Structures Bioinformatics; Biology; Chromatin Assembly and Disassembly; Biochemistry; Nucleosomes; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; k-mer; Animals; Humans; Nucleosome; Molecular Biology; Composition (language); Epigenomics
researchProduct

Algorithmic paradigms for stability-based cluster validity and model selection statistical methods, with applications to microarray data analysis

2012

Abstract The advent of high throughput technologies, in particular microarrays, for biological research has revived interest in clustering, resulting in a plethora of new clustering algorithms. However, model selection, i.e., the identification of the correct number of clusters in a dataset, has received relatively little attention. Indeed, although central for statistics, its difficulty is also well known. Fortunately, a few novel techniques for model selection, representing a sharp departure from previous ones in statistics, have been proposed and gained prominence for microarray data analysis. Among those, the stability-based methods are the most robust and best performing in terms of pre…

Settore INF/01 - Informatica; General Computer Science; business.industry; Computer science; Bioinformatics; Model selection; General statistics; Machine learning; computer.software_genre; Theoretical Computer Science; Computational biology; Analysis of massive datasets; Machine learning; Cluster (physics); Algorithms and data structures General statistics Analysis of massive datasets Machine learning Computational biology Bioinformatics; Algorithms and data structures; Algorithm design; Artificial intelligence; Cluster analysis; business; Completeness (statistics); computer; Computer Science(all); Theoretical Computer Science
researchProduct

Stability-Based Model Selection for High Throughput Genomic Data: An Algorithmic Paradigm

2012

Clustering is one of the most well known activities in scientific investigation and the object of research in many disciplines, ranging from Statistics to Computer Science. In this beautiful area, one of the most difficult challenges is the model selection problem, i.e., the identification of the correct number of clusters in a dataset. In the last decade, a few novel techniques for model selection, representing a sharp departure from previous ones in statistics, have been proposed and gained prominence for microarray data analysis. Among those, the stability-based methods are the most robust and best performing in terms of prediction, but the slowest in terms of time. Unfortunately…

Class (computer programming); Settore INF/01 - Informatica; business.industry; Computer science; Heuristic (computer science); Model selection; Stability (learning theory); Machine learning; computer.software_genre; Identification (information); Algorithm design; Artificial intelligence; Cluster analysis; business; Algorithms and Data Structures; Throughput (business); computer
researchProduct